Planning with Stochastic Resource Profiles: An Application to Human-Robot Co-habitation

Authors

  • Tathagata Chakraborti
  • Subbarao Kambhampati
Abstract

It is important for robotic agents to be respectful of the intentions of the humans cohabiting an environment, and to account for conflicts over the shared resources in that environment, in order to be acceptable members of human-robot ecosystems. In this paper we look at how maintaining predictive models of the human cohabitors can be used to inform the planning process of the robotic agents. We introduce an Integer Programming (IP) based planner as a general formulation of the "human-aware" planning problem and show how the proposed formulation can model different behaviors of the robotic agent, showcasing compromise, opportunism, or negotiation. Finally, we show how the proposed approach scales with the different parameters involved, and provide empirical evaluations to illustrate the pros and cons of the proposed style of planning.

Introduction

In environments where multiple agents are working independently but utilizing shared resources, it is important for these agents to maintain belief models of the other agents so as to act intelligently and prevent conflicts. In cases where some of these agents are humans, as with assistive robots in household environments, these are required (rather than merely desired) capabilities of robots if they are to be "socially acceptable". This has been studied extensively under the umbrella of "human-aware" planning, both in the context of path planning (Sisbot et al. 2007; Kuderer et al. 2012) and in task planning (Cirillo, Karlsson, and Saffiotti 2009; Koeckemann, Pecora, and Karlsson 2014; Cavallo et al. 2014; Tomic, Pecora, and Saffiotti 2014). Probabilistic plan recognition can play an important role in this regard: by not committing to a plan that presumes a particular plan for the other agent, it may be possible to minimize suboptimal behavior of the autonomous agent (in terms of redundant or conflicting actions performed during the execution phase).
Here we look at possible ways to minimize such suboptimal behavior by way of compromise, opportunism, or negotiation. There has been previous work (Beaudry, Kabanza, and Michaud 2010; Cirillo, Karlsson, and Saffiotti 2010) on some of the modeling aspects of the problem, in terms of planning with uncertainty in resources and constraints. In this paper we provide a unified framework for achieving these behaviors of autonomous agents, particularly in such scenarios of human-robot cohabitation.

Figure 1: Architecture diagram. The robot has partial beliefs of the world, which it uses to predict and plan.

The general framework of the problem addressed in this work is shown in Figure 1. The autonomous agent, or robot, is acting (with independent goals) in an environment cohabited with other agents (humans), who are similarly self-interested. The robot has a model of the other agents acting independently in its environment. These models may be partial, and hence the robot can only make uncertain predictions of how the world will evolve with time. However, the resources in the environment are limited and are likely to be constrained by the plans of the other agents. The robot thus needs to reason about the future states of the environment in order to make sure that its own plans do not produce conflicting states with respect to the plans of the other agents. With the involvement of humans, however, the problem is further skewed against the robot, because humans expect a higher priority on their plans: robots that produce plans clashing with those of the humans, without any explanation, would be considered incompatible with such an ecosystem. Thus the robot is expected to follow plans that preserve the human plans, rather than a globally optimal plan for itself.
This aspect makes the current setting distinct from typical human-robot teaming scenarios and produces a number of interesting challenges of its own. How does the robot model the humans' behavior? How does it plan to avoid friction with the human plans? If it is possible to communicate, how does it plan to negotiate and refine plans? These are the questions that we seek to address in this work. Our approach models human beliefs and defines resource profiles as abstract representations of the plans predicted on the basis of these beliefs of the human agents. The robot updates its own beliefs about the world upon receiving every new observation from its environment, and passes the resultant profiles on to its own planner, as shown in Figure 1. For the planning module, we introduce an IP-based planner that minimizes the overlap between these resource profiles and those produced by the robot's own plan, in order to maintain the fewest conflicts with the predicted human tasks in the future.

Figure 2: The running example. A human commander and a robot involved in a USAR setting, with constrained resources (medkits).

1 Planning with Resource Profiles

We will now go into details about each of the modules shown in Figure 1. We will use a setting similar to the one described in (Talamadupula et al. 2014) (shown in Figure 2) as the running example throughout this discussion. The setting involves a commander CommX and a robot in a USAR (Urban Search and Rescue) scenario. The shared resources here are the two medkits: some of the plans the commander can execute will lock the use of, and/or change the position of, these medkits, so that from the set of probable plans of the commander we can extract a probability distribution over the usage (or even the position) of each medkit over time, based on the fraction of plans that conform to these facts. These resource availability profiles provide a way for the agents to minimize conflicts with the other agents.
Before going into details about the planner that achieves this, we first look at how the agents are modeled and how these profiles are computed.

1.1 The Belief Modeling Component

The notion of modeling beliefs introduced by the authors in (Talamadupula et al. 2014) is adopted in this work and described briefly here. Beliefs about state are defined in terms of predicates bel(α, φ), where agent α believes the predicate φ to be true. Goals are defined by predicates goal(α, φ), where agent α has the goal φ. The set of all beliefs that the robot ascribes to α together represents the robot's perspective of α. This is captured by a belief model Bel_α of agent α, defined as {φ | bel(α, φ) ∈ Bel_self}, where Bel_self is the set of first-order beliefs of the robot (e.g., bel(self, at(self, room1))). The set of goals ascribed to α is similarly described by {goal(α, φ) | goal(α, φ) ∈ Bel_self}.

Next, we turn our attention to the domain model D_α of agent α that is used in the planning process. Formally, a planning problem Π = ⟨D_α, π_α⟩ consists of the domain model D_α and the problem instance π_α. The domain model of α is defined as D_α = ⟨T_α, V_α, S_α, A_α⟩, where T_α is a set of object types; V_α is a set of variables that describe objects belonging to T_α; S_α is a set of named first-order logical predicates over the variables V_α that describe the state; and A_α is a set of operators available to the agent. The action models a ∈ A_α are represented as a = ⟨N, C, P, E⟩, where N denotes the name of the action; C is its cost; P is the list of preconditions that must hold for the action a to be applicable; and E = {eff+(a), eff−(a)} are the lists of predicates in S_α that indicate the add and delete effects of applying the action. The transition function δ(·) determines the next state after the application of action a in state s as δ(a, s) = (s \ eff−(a)) ∪ eff+(a), s ⊆ S_α.
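The transition function δ can be sketched directly as set operations over ground predicates. A minimal illustration in Python (the predicate strings and the pickup action grounding are hypothetical stand-ins from the USAR running example, not the paper's actual encoding):

```python
# Minimal STRIPS-style state transition: delta(a, s) = (s \ eff-(a)) U eff+(a).
# States are frozensets of ground predicates (strings); an action carries a
# name N, cost C, preconditions P, add effects eff+ and delete effects eff-.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    cost: float
    pre: frozenset
    eff_add: frozenset
    eff_del: frozenset

def applicable(a, s):
    """An action is applicable when all its preconditions hold in s."""
    return a.pre <= s

def delta(a, s):
    """Progress state s through action a (raises if a is inapplicable)."""
    if not applicable(a, s):
        raise ValueError(f"{a.name} not applicable")
    return (s - a.eff_del) | a.eff_add

# Hypothetical grounding: the robot picks up a medkit in room1.
pickup = Action(
    name="pickup_medkit1",
    cost=1.0,
    pre=frozenset({"at(self,room1)", "at(medkit1,room1)"}),
    eff_add=frozenset({"holding(self,medkit1)"}),
    eff_del=frozenset({"at(medkit1,room1)"}),
)
s0 = frozenset({"at(self,room1)", "at(medkit1,room1)"})
s1 = delta(pickup, s0)
```

Applying the same function step by step over a plan ⟨a_1, ..., a_t⟩ yields the progression δ(a_t, ..., δ(a_1, I)) used later when computing profiles.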
For this work, we assume that the action models available to an agent are completely known to all the other agents in the scenario; that is, we rule out the possibility of beliefs on the models of other agents. The belief model, in conjunction with beliefs about the goals/intentions of another agent, allows the robot to instantiate a planning problem π_α = ⟨O_α, I_α, G_α⟩, where O_α is a set of objects of types t ∈ T_α; I_α is the initial state of the world; and G_α is a set of goals, the latter two being sets of predicates from S_α instantiated with objects from O_α. First, the initial state I_α is populated with all of the robot's initial beliefs about the agent α, i.e., I_α = {φ | bel(α, φ) ∈ Bel_self}. Similarly, the goal is set to G_α = {φ | goal(α, φ) ∈ Bel_self}. Finally, the set of objects O_α consists of all the objects mentioned in either the initial state or the goal description: O_α = {o | o occurs in some φ ∈ I_α ∪ G_α}. This planning problem instance (though not directly used in the robot's planning process) enables the goal recognition component to solve the compiled problem instances.

1.2 The Goal Recognition Component

The robot is unlikely to be completely aware of the goals of the humans in its environment, but it can be proactive in updating its beliefs incrementally based on observations of what the other agents are doing. To accommodate this, the robot's current belief about α's goal, G_α, is extended to a hypothesis goal set Ψ_α. The computation of this goal set can be done using planning graph (Blum and Furst 1995) methods. In the worst case, Ψ_α corresponds to all possible goals in the final level of the converged planning graph. Further (domain-dependent) knowledge (e.g., in our scenario, the information that CommX is only interested in triage-related goals) can prune some of these goals by removing the goal conditions that are not typed on the triage variable.
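Instantiating π_α from the robot's belief base amounts to filtering the bel/goal literals ascribed to α and collecting the objects they mention. A toy sketch, assuming beliefs are stored as (kind, agent, predicate, args) tuples (this representation, and the commX facts below, are hypothetical):

```python
# Build the problem instance pi_alpha = <O_alpha, I_alpha, G_alpha> ascribed
# to agent alpha by filtering the robot's own belief base Bel_self.
def instantiate_problem(bel_self, alpha):
    # I_alpha: predicates the robot believes alpha believes.
    init = [(p, args) for kind, ag, p, args in bel_self
            if kind == "bel" and ag == alpha]
    # G_alpha: goals the robot ascribes to alpha.
    goals = [(p, args) for kind, ag, p, args in bel_self
             if kind == "goal" and ag == alpha]
    # O_alpha: every object mentioned in the initial state or the goals.
    objects = {o for _, args in init + goals for o in args}
    return objects, init, goals

bel_self = [
    ("bel",  "commX", "at",      ("commX", "room1")),
    ("bel",  "commX", "at",      ("medkit1", "room2")),
    ("goal", "commX", "triaged", ("victim1",)),
    ("bel",  "self",  "at",      ("self", "hall")),  # robot's own belief, not ascribed to commX
]
O, I, G = instantiate_problem(bel_self, "commX")
```

Note that the robot's first-person beliefs (agent "self") are excluded from the instance ascribed to commX, matching the filter {φ | bel(α, φ) ∈ Bel_self}.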
At this point we use the approach of Ramirez and Geffner (2010), who provided a technique to compile the problem of goal recognition into a classical planning problem. Given a sequence of observations θ, the probability distribution Θ over goals G ∈ Ψ_α is recomputed by the Bayesian update P(G|θ) ∝ P(θ|G) × P(G), where the likelihood is approximated by P(θ|G) = 1/(1 + e^(−βΔ(G,θ))) with Δ(G, θ) = C_p(G − θ) − C_p(G + θ); here C_p(G + θ) is the cost of an optimal plan that achieves G while complying with the observations, and C_p(G − θ) the cost of one that achieves G while avoiding them. Thus, solving two planning problems, with goals G − θ and G + θ, gives the posterior distribution Θ over the possible goals of α. We then compute the optimal plans for the goals in Ψ_α, which are used to compute the resource profiles described in the next section. Note that one immediate advantage of this specific goal recognition approach is that, while computing the plan for a particular goal G, we can reuse the compiled problem instance with the goal G + θ to ensure that the predicted plan conforms to the existing observations.

Figure 3: Different types of profiles corresponding to the two recognized plans.

1.3 Resources and Resource Profiles

As discussed previously, since the plans of the agents are in parallel execution, the uncertainty introduced by the commander's actions cannot be mapped directly between the commander's final state and the robot's initial state. However, given the commander's possible plans as recognized by the robot, we can extract information about the steps, or points in time, at which the shared resources in the environment are likely to be locked by the commander (given that we know what these shared resources are). This information can be represented by resource usage profiles that capture the expected (over all recognized plans) variation of the probability of usage or availability over time. The robot can, in turn, use this information to make sure that the profile imposed by its own plan has minimal conflicts with the commander's.
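The Bayesian update over the hypothesis goal set can be sketched in a few lines once the two plan costs per goal are available. In this illustration the goal names and numeric costs are hypothetical placeholders for the outputs of the two compiled planning problems:

```python
# Posterior over hypothesis goals, Ramirez-Geffner style:
# P(G|theta) ∝ P(theta|G) * P(G), with P(theta|G) = sigmoid(beta * delta)
# and delta(G, theta) = Cp(G - theta) - Cp(G + theta).
import math

def goal_posterior(costs, prior=None, beta=1.0):
    """costs: {goal: (cost_avoiding_obs, cost_complying_obs)}."""
    goals = list(costs)
    prior = prior or {g: 1.0 / len(goals) for g in goals}  # uniform by default
    unnorm = {}
    for g, (c_minus, c_plus) in costs.items():
        delta = c_minus - c_plus  # cheaper to comply => larger delta => likelier goal
        likelihood = 1.0 / (1.0 + math.exp(-beta * delta))
        unnorm[g] = likelihood * prior[g]
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

# Observations make goal_A much cheaper to comply with than to avoid:
posterior = goal_posterior({"goal_A": (10.0, 4.0), "goal_B": (5.0, 6.0)})
```

With these costs, goal_A receives most of the posterior mass, since complying with the observations saves it 6 cost units while goal_B is actually cheaper to achieve while avoiding them.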
Formally, a profile is defined as a mapping from time steps to real numbers between 0 and 1, G : ℕ → [0, 1], represented as a set of tuples {(t, g) | t ∈ ℕ, g ∈ [0, 1], such that G(t) = g at time step t}.

Resource profiles can be handled at two levels of abstraction. Going back to our running example, the shared resources that can come under conflict are the two medkits (objects of the locatable type), and the profiles over the medkits can be over both usage and location, as shown in Figure 3. These different types of profiles can be used (possibly in conjunction, if needed) for different purposes. For example, the usage profile shown at the top is more helpful in identifying when to use the specific resource, while the profile bound to location-specific groundings, shown at the bottom, can lead to more complicated higher-order reasoning (e.g., the robot can decide to wait for the commander's plans to finish, since with high probability the commander inadvertently brings the medkit closer to it as a result of his own plans). We will look at this again in Section 2.

Let the domain model of the robot be D_R = ⟨T_R, V_R, S_R, A_R⟩, with the action models a = ⟨N, C, P, E⟩ defined in the same way as in Section 1.1. Also, let Λ ⊆ V_R be the set of shared resources; for each λ ∈ Λ let f_λ ⊆ S_R be the set of predicates influenced by λ, and let Γ : Λ → 2^ξ be the function that maps each resource variable to the predicates it influences, where ξ = ∪_λ f_λ. Without any external knowledge of the environment, we can set Λ = V_α ∩ V_R and ξ = S_α ∩ S_R, though in most cases these sets are much smaller. In the following discussion, we look at how the knowledge from the hypothesis goal set can be modeled in terms of resource availability profiles for each of the constrained resources λ ∈ Λ.

Consider the set of plans Ψ_α (overloading the symbol for the hypothesis goal set) containing an optimal plan for each goal in the hypothesis goal set, i.e., Ψ_α = {π_G = ⟨a_1, a_2, ..., a_t⟩ | G = δ(a_t, ..., δ(a_2, δ(a_1, I_α))), a_i ∈ A_α ∀i, ∀G ∈ Ψ_α}, and let l(π) be the likelihood of plan π, modeled on the goal likelihood distribution p(G) ∼ Θ for all G ∈ Ψ_α as l(π_G) = c|π_G| × p(G), where c is a normalization constant.

At each time step t, a plan π ∈ Ψ_α may lock one or more of the resources λ ∈ Λ. Each plan thus provides a usage profile of a resource λ with respect to the time step t: G_λ^π : ℕ → {0, 1} = {(t, g) | t ∈ [1, |π|], with g = 1 if λ is locked by π at step t, and g = 0 otherwise}, such that G_λ^π(t) = g for all (t, g) ∈ G_λ^π. The resultant usage profile of a resource λ due to all the plans in Ψ_α is obtained by summing over all the individual profiles, weighted by their likelihoods: G_λ : ℕ → [0, 1] = {(t, g) | t = 1, 2, ..., max_π |π| and g ∝ (1/|Ψ_α|) Σ_{π ∈ Ψ_α} G_λ^π(t) × l(π)}.

Similarly, we can define profiles over the actual groundings of a variable (shown in the lower part of Figure 3) as G_f^π = {(t, g) | t ∈ [1, |π|], with g = 1 if f holds at step t of plan π, and g = 0 otherwise}, and the resultant profile due to all the plans in Ψ_α is obtained as before: G_f = {(t, g) | t = 1, 2, ..., max_π |π| and g ∝ (1/|Ψ_α|) Σ_{π ∈ Ψ_α} G_f^π(t) × l(π)}. These profiles are helpful when actions in the robot's domain are conditioned on such variables, and the values of these variables are conditioned on the plans of the other agents currently under execution.

One important aspect of this formulation is that the notion of "resources" is described in terms of a subset of the predicates common to the domains of the agents (ξ ⊆ S_α ∩ S_R), and can thus be used as a generalized definition to model different types of conflict between the plans of two agents. In as much as these predicates are descriptions (possibly instantiated) of the typed variables in the domain, and actually refer to the physical resources in the environment that might be shared by the agents, we stick to the nomenclature of calling them "resources".
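The likelihood-weighted aggregation of per-plan profiles can be sketched concretely. Here the recognized plans are encoded as lists of the resource sets locked at each step (a hypothetical encoding; the plan contents and likelihoods are illustrative):

```python
# Aggregate a resource usage profile G_lambda over the recognized plans:
# each plan contributes a binary per-step profile G_lambda^pi, weighted by
# its likelihood l(pi), and the result is normalized into [0, 1].
def usage_profile(plans, likelihoods, resource):
    horizon = max(len(p) for p in plans)       # profile runs to max |pi|
    total = sum(likelihoods)                   # normalization over plan mass
    profile = []
    for t in range(horizon):
        # Sum the likelihood of every plan that locks the resource at step t.
        g = sum(l for p, l in zip(plans, likelihoods)
                if t < len(p) and resource in p[t])
        profile.append(g / total)              # expected prob. of being locked
    return profile

plan1 = [set(), {"medkit1"}, {"medkit1"}, set()]  # locks medkit1 at steps 1-2
plan2 = [{"medkit1"}, {"medkit1"}, set()]         # locks medkit1 at steps 0-1
prof = usage_profile([plan1, plan2], [0.75, 0.25], "medkit1")
```

The resulting profile peaks where the likelier plan holds the medkit, which is exactly the shape the robot's planner will try to avoid overlapping with.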
We will now look at how an autonomous agent can use these resource profiles to minimize conflicts with the other agents in its environment during plan execution.

1.4 Conflict Minimization

The planning problem of the robot, given by Π = ⟨D_R, π_R, Λ, {G_λ | λ ∈ Λ}, {G_f | f ∈ Γ(λ), λ ∈ Λ}⟩, consists of the domain model D_R and the problem instance π_R = ⟨O_R, I_R, G_R⟩ similar to that described in Section 1.1, together with the constrained resources and all the profiles corresponding to them. This is because the planning process must take into account both goals of achievement and conflicts of resource usage as described by the profiles. Traditional planners provide no direct way to handle such profiles within the planning process. Note that since the execution of the agents' plans occurs in parallel, the uncertainty evolves at execution time; it cannot be captured from the goal states of the recognized plans alone, and consequently cannot simply be compiled away into initial-state uncertainty for the robot and solved as a conformant plan. Similarly, the problem does not directly compile into action costs in a metric planning instance, because the profiles themselves vary with time. We therefore need a planner that can handle resource constraints that are both stochastic and non-stationary due to the uncertainty in the environment. To this end we introduce the following IP-based planner (partly following the technique for IP encoding of state-space planning outlined in (Vossen et al. 1999)) as an elegant way to sum over and minimize overlaps in profiles during the plan generation process. The following formulation finds T-step plans in the case of non-durative, or instantaneous, actions.

For each action a ∈ A_R at step t we have an action variable:
x_{a,t} = 1 if action a is executed at step t, and 0 otherwise; ∀a ∈ A_R, t ∈ {1, 2, ..., T}.

Also, for every proposition f at step t, a binary state variable is introduced:
y_{f,t} = 1 if proposition f is true at plan step t, and 0 otherwise; ∀f ∈ S_R, t ∈ {0, 1, ..., T}.

Note that the plan being computed for the robot introduces a new resource consumption profile itself, so one optimizing criterion is to minimize the overlap between the usage profile of the computed plan and those established by the predicted plans of the other agents in the environment. We introduce a new variable to model the resource usage imposed by the robot:
g_{f,t} = 1 if f ∈ ξ is locked at plan step t, and 0 otherwise; ∀f ∈ ξ, t ∈ {0, 1, ..., T}.

For every resource λ ∈ Λ, the actions in the robot's domain are divided into three sets: Ω+_f = {a ∈ A_R | x_{a,t} = 1 ⟹ y_{f,t} = 1}, Ω−_f = {a ∈ A_R | x_{a,t} = 1 ⟹ y_{f,t} = 0}, and Ω_f = A_R \ (Ω+_f ∪ Ω−_f). These specify, respectively, those actions in the domain that lock, free up, or do not affect the current use of a particular resource, and are used to compute g_{f,t} as part of the IP.

Further, we introduce a variable h_{f,t} to track preconditions, required by actions in the generated plan, that are conditioned on the plans of the other agents (e.g., the positions of the medkits are changing, and the action pickup is conditioned on them):
h_{f,t} = 1 if f ∈ P_a and x_{a,t+1} = 1, and 0 otherwise; ∀f ∈ ξ, t ∈ {0, 1, ..., T − 1}.

The solution to the IP should then ensure that the robot uses these resources only when they are in fact most expected to be available (obtained by maximizing the overlap between h_{f,t} and the availability implied by G_f); these act like demand profiles from the perspective of the robot. We also add a new "no-operation" action, A_R ← A_R ∪ {a_φ}, where a_φ = ⟨N, C, P, E⟩ with N = NOOP, C = 0, P = {} and E = {}. The IP formulation is given by:

min k_1 Σ_{a ∈ A_R} Σ_{t ∈ {1, 2, ..., T}} C_a × x_{a,t} + k_2 Σ ...
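The overlap terms in the objective can be illustrated without a solver: given a candidate robot plan's lock profile g_{f,t} and demand profile h_{f,t}, score it against the predicted usage profile. This is a toy evaluation loop over enumerated candidates, not the IP encoding itself; the weights k2 and k3, the profiles, and the two candidate plans are hypothetical:

```python
# Score candidate robot plans against the predicted resource profiles:
# penalize overlap between the robot's lock profile g[t] and the predicted
# usage profile, and reward placing demands h[t] where the predicted
# availability (1 - usage) is high. A surrogate for the IP's overlap terms,
# evaluated by enumeration rather than optimized.
def conflict_score(g, h, usage, k2=1.0, k3=1.0):
    T = len(usage)
    overlap = sum(g[t] * usage[t] for t in range(T))        # conflicts to minimize
    fit = sum(h[t] * (1.0 - usage[t]) for t in range(T))    # well-placed demands
    return k2 * overlap - k3 * fit

usage = [0.0, 1.0, 0.75, 0.0]   # predicted medkit usage by CommX over 4 steps
# Candidate A grabs the medkit at steps 1-2 (clashing with the commander);
# candidate B waits and uses it at step 3, once the commander is likely done.
cand_a = {"g": [0, 1, 1, 0], "h": [0, 1, 0, 0]}
cand_b = {"g": [0, 0, 0, 1], "h": [0, 0, 0, 1]}
best = min((cand_a, cand_b),
           key=lambda c: conflict_score(c["g"], c["h"], usage))
```

The waiting candidate wins, mirroring the compromise behavior discussed earlier: the robot defers its use of the medkit to the window where the predicted profile leaves it free.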


Publication date: 2015